
Journal of Molecular Evolution

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match Journal of Molecular Evolution's content profile, based on 21 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Short Interrupted Repeats Cassette (SIRC) ensembles of plant genomes reflects evolutionary route

Gorbenko, I. V.; Scherbakov, D. Y.; Zverintseva, K. M.; Konstantinov, Y. M.

2026-03-30 plant biology 10.64898/2026.03.27.714674 medRxiv
Top 0.1%
6.4%

Short Interrupted Repeats Cassettes (SIRC) are recently discovered eukaryotic DNA elements possessing many traits of satellite DNA and mobile genetic elements, and consisting of short direct repeats interspersed with diverse spacer sequences. The SIRC ensemble of an individual species is highly heterogeneous and cannot be studied using alignment methods. It was found that the number of similar SIRC sequences in a given pair of species is in general correlated with their taxonomic distance, and, at the same time, closely related species can possess very diverged SIRC ensembles, which makes the SIRC evolutionary pattern closer to that of mobile genetic elements. The SIRC sequences make up clusters with comparable sequence patterns that are likely to follow a doublet evolutionary model, which strongly suggests that the SIRC structure is maintained by evolutionary selection. Several SIRC sequences of Arabidopsis were found to be of ancient origin, with an evolutionary history traceable as far back as the moss clade. We carried out unbiased detection of SIRC ensembles in 10 plant genomes and found that, despite very high intraspecies heterogeneity, SIRC sets possess strong interspecies phylogenetic signal. Key message: Short Interrupted Repeats Cassettes are elements of ancient origin, and could potentially be used to trace organism history, and to facilitate synteny and Hi-C analysis.

2
Early evolution of the prokaryotic transcription factor repertoire

Singh, I. R.; Dubey, A.; Seshasayee, A. S. N.

2026-04-11 evolutionary biology 10.64898/2026.04.08.717362 medRxiv
Top 0.1%
3.9%

Transcription initiation is regulated by proteins called transcription factors (TFs). Though TFs help determine phenotype across the tree of life, they are nonessential for minimal cellular life and are often absent in endosymbiotic and parasitic organisms. Given this and the idea that it is a certain level of organism complexity that calls for specific transcription regulation, we traced the evolutionary history of TF repertoire on a bacterio-archaeal tree of life using a dataset of ~500,000 TFs, grouped into ~1,700 orthologous groups (OGs) across ~3,000 species. The most ancestral prokaryotes encoded multiple TFs. Going by known extant functions of these TFs, they possibly regulated sugar-fermentation metabolism, sensed overall metabolic state and redox, responded to DNA damage or bound metals; many of which are consistent with some reconstructions of ancestral gene pools and physiologies. The number of TFs as well as their superfamily-level diversity, through evolutionary history, matches expectations against genome size derived from extant bacteria, suggesting pre-LUCA diversification of TF sequence families. Emergence of new TFs along the phylogeny largely followed a smooth cumulative distribution curve, suggesting steady innovation, early in prokaryote evolution, in contrast to eukaryotes, in which a majority of TF families emerged in a burst manner at the ancestors of multicellular lineages. Gains of TFs late in prokaryotic evolution predominantly featured recycling of protein families discovered elsewhere in the prokaryotic tree, consistent with the dominance of horizontal gene transfer in these organisms. We speculate on the difference between the evolutionary trajectory of prokaryotic TF repertoire and compare it with the eukaryotic TF repertoire trajectory. This helps us in understanding the manner in which their TF repertoires have evolved in two different super-kingdoms. 
The difference between the evolutionary dynamics of TF-repertoires might be due to how complexity is envisioned in these two different kingdoms.

3
Genomic indicators of gene function: A systematic assessment of the human genome

Cooper, H. B.; Rojas Lopez, K. E.; Schiavinato, D.; Black, M. A.; Gardner, P. P.

2026-04-09 genomics 10.64898/2026.04.08.717348 medRxiv
Top 0.1%
3.8%

Proteins and non-coding RNAs are functional products of the genome that are central to crucial cellular processes. With recent technological advances, researchers can sequence genomes in the thousands and probe numerous genomic activities of many species and conditions. Such studies have identified thousands of potential proteins, RNAs and associated activities. However, there are conflicting interpretations of the results and therefore of which regions of the genome are "functional". Here we investigate the relative strengths of associations between coding and non-coding gene functionality and genomic features, by comparing reliably annotated functional genes to non-genic regions of the genome. We find that the strongest and most consistent associations between functional genes and genomic features are with transcriptional activity and evolutionary conservation. We also evaluated sequence-based statistics, genomic repeats, epigenetic and population variation data. Other features strongly associated with function include histone marks, chromatin accessibility, genomic copy-number, and sequence alignment statistics such as coding potential and covariation. We also identify potential issues with SNP annotations in short non-coding RNAs, as some highly conserved ncRNAs have significantly higher than expected SNP densities. Our results demonstrate the importance of evolutionary conservation and transcription activity for indicating protein-coding and non-coding gene function. Both should be taken into consideration when differentiating between functional sequences and biological or experimental noise.

4
Identification of a Third Period-tuning Site in Cyanobacterial Clock Protein KaiC

Horiuchi, K.; Furuike, Y.; Ito-Miwa, K.; Onoue, Y.; Akiyama, S.

2026-05-14 biochemistry 10.64898/2026.05.11.724173 medRxiv
Top 0.1%
3.1%

KaiC, a clock protein in cyanobacteria, cycles between dephosphorylated and phosphorylated states in a 24-hour period in the presence of KaiA and KaiB. We identified the 322nd residue of KaiC as a third example of period-tuning sites. 322nd-site-directed saturation mutagenesis resulted in a variety of KaiC mutants exhibiting either shortened or lengthened cycles. The tunable range of the periods was from approximately 11 to 78 h without significantly compromising temperature compensation. We conducted biochemical analyses of the 322nd variants and examined their predicted structural models. In contrast to another known period-tuning site, where the period decreases sharply as the side-chain volume increases due to mutations, the cycle lengths correlate only modestly with bulkiness at the 322nd residue. The 322nd residue is located in a C-terminal domain of KaiC and influences ATPase cycles in both the C-terminal domain and an N-terminal domain through its interaction with a flexible loop connecting the two domains. The structural models predict that placing less bulky but polar side chains, such as serine and threonine, at the 322nd position leads to the formation of a hydrogen-bonding network between that site and the loop. This reduces the mobility of the loop, resulting in longer cycles due to decreases in the ATPase activity of the N-terminal domain. Conversely, placing bulky residues such as phenylalanine at the 322nd position appears to alter the loop structure, shortening the periods by enhancing the ATPase activities of both domains. The third period-tuning mechanism is distinct from other known mechanisms. Significance Statement: A Kai-protein clock system serves as a model for studying how long circadian rhythms are achieved. We identified the 322nd residue of KaiC as a third example of period-tuning sites that allow tuning of the period in both the long- and short-period directions. 
The third period-tuning mechanism differs from the two previously known types in several respects. Previous studies have suggested that the ATPase activity in an N-terminal domain of KaiC is the primary regulator of the period. On the other hand, the 322nd residues of KaiC can affect the period by activating the ATPase cycle in its C-terminal domain. Our findings will stimulate future studies on the period-tuning mechanism mediated by the ATPase activity in the C-terminal domain of KaiC.

5
Multiple molecular and cellular properties jointly affect protein and site-specific evolutionary rates

Saini, A.; Usmanova, D. R.; Supo Escalante, R.; Vitkup, D.

2026-05-23 evolutionary biology 10.64898/2026.05.20.726710 medRxiv
Top 0.1%
2.1%

Protein evolutionary rates vary widely across proteins and among sites within proteins, reflecting multiple molecular, cellular, and functional constraints. While protein-level properties, such as expression and essentiality, and site-level structural and functional constraints, are known to influence evolutionary rates, how these constraints combine across scales to determine site-specific evolutionary rates remains unclear. Moreover, because many protein features are strongly correlated, it is difficult to disentangle their individual contributions to evolutionary rate variance, and unified predictive models that integrate these properties are still lacking. Here, we use neural networks to predict protein evolutionary rates across multiple scales based on multiple molecular and cellular features. At the protein level, integrating molecular and cellular descriptors explains substantial variance in evolutionary rates across proteins in multiple eukaryotic species, including nearly 50% of the variance in humans and substantial fractions of the variance in other eukaryotic species. The model also allows us to identify proteins whose evolutionary rates deviate from expectations based on their molecular and cellular properties. At the site level, we found that structural and functional features explain a comparable fraction of the variance in relative evolutionary rates. By integrating protein-level and site-level predictors, the model explains up to 37% of the variance in site-specific evolutionary rates across proteins. Our analysis demonstrates that constraints at these two scales combine largely additively, with protein-level properties setting the overall evolutionary context and site-level properties shaping variation within proteins. Together, these results provide a quantitative framework for understanding protein evolution across biological scales.

6
Unlocking a flexible set of phylogenetic models for discrete and continuous trait evolution using discretized stochastic diffusion

Revell, L. J.; Alencar, L. R. V.; Alfaro, M. E.; Dain, J.; Hill, N. J.; Jones, M.; Martinet, K. M.; Romero-Alarcon, V.; Harmon, L. J.

2026-04-21 evolutionary biology 10.64898/2026.04.20.719455 medRxiv
Top 0.1%
2.1%

The practical utility of many modern phylogenetic comparative methods can depend on how accurately mathematical models capture the evolutionary process of traits. Boucher and Demery (2016) described a new quantitative trait model, Brownian motion with reflective limits, that they anticipated might be of use in testing hypotheses about a particular sort of constraint on phenotypic character evolution. Since their analytic solution for the probability function under this bounded evolutionary scenario was not practical to evaluate for reasonably-sized trees, Boucher and Demery (2016) also identified a creative technique for computing the likelihood of their model. The basis of this methodology derives from the convergence of an equal-rates, symmetric, ordered Markov chain and continuous stochastic diffusion in the limit as the number of steps in our chain goes to infinity (or, alternatively, as their widths decrease towards zero). We refer to this convergence in the limit as the discretized diffusion approximation or (more compactly) the discrete approximation. We realized that this discrete approximation of Boucher and Demery (2016) unlocked a number of additional models for the phylogenetic comparative analysis of discrete and continuous trait data, and we explore several of these in the present article. Specifically, we examine application of this discretized diffusion approximation to the threshold model from evolutionary quantitative genetics, to a new "semi-threshold" trait evolution model, to a joint model of discrete and continuous traits in which the discrete trait influences the rate of evolution of our continuous character, as well as a model where precisely the converse is true, and to a discrete character dependent multi-trend trended continuous trait evolution model. We conclude with some context for the origins of our article and discussion of other possible applications of this powerful approach.
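
The discretized diffusion idea described in this abstract can be illustrated with a short sketch. It is not the authors' implementation: the function names, the bin count, and the bounds are hypothetical choices made for illustration. It assumes Brownian motion with rate sigma2 on an interval with reflecting limits, cut into n_bins states of width delta, and uses the standard matching rate q = sigma2 / (2 * delta**2) between adjacent states so that the chain's variance per unit time equals sigma2.

```python
import numpy as np


def bounded_bm_rate_matrix(sigma2, lower, upper, n_bins):
    """Rate matrix of an equal-rates, symmetric, ordered Markov chain.

    Hypothetical helper: approximates Brownian motion with rate sigma2
    on [lower, upper] with reflecting limits, using n_bins states.
    """
    delta = (upper - lower) / n_bins      # width of each discrete state
    q = sigma2 / (2.0 * delta ** 2)       # rate of moving to a neighbour
    Q = np.zeros((n_bins, n_bins))
    idx = np.arange(n_bins - 1)
    Q[idx, idx + 1] = q                   # step up
    Q[idx + 1, idx] = q                   # step down
    # End states only move inward, which gives the reflecting limits.
    Q[np.diag_indices(n_bins)] = -Q.sum(axis=1)
    return Q


def transition_probs(Q, t):
    """P(t) = exp(Q t), computed by eigendecomposition (Q is symmetric)."""
    w, V = np.linalg.eigh(Q)
    return (V * np.exp(w * t)) @ V.T


def bin_midpoints(lower, upper, n_bins):
    """Trait value represented by each discrete state."""
    delta = (upper - lower) / n_bins
    return lower + (np.arange(n_bins) + 0.5) * delta


if __name__ == "__main__":
    Q = bounded_bm_rate_matrix(sigma2=1.0, lower=0.0, upper=1.0, n_bins=50)
    P = transition_probs(Q, t=0.001)
    x = bin_midpoints(0.0, 1.0, 50)
    start = 25                            # a state far from either limit
    mean = P[start] @ x
    var = P[start] @ x ** 2 - mean ** 2
    print("variance after t=0.001:", var)  # should be close to sigma2 * t
```

Increasing n_bins narrows the states and brings the chain closer to the continuous process, at the cost of a larger matrix. On a tree, a matrix like P would be used along each branch in the usual pruning calculation for discrete characters.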

7
A New Information Theoretic Approach Shows that Mixture Models Outperform Partitioned Models for Phylogenetic Analyses of Amino Acid Data

Ren, H.; Jiang, C.; Wong, T. K. F.; Shao, Y.; Susko, E.; Minh, B. Q.; Lanfear, R.

2026-03-18 evolutionary biology 10.64898/2026.03.16.712229 medRxiv
Top 0.1%
1.9%

Partitioned and mixture models are widely employed in Maximum Likelihood phylogenetic analyses of large genomic datasets. Comparing the fit of the two types of models has been challenging, because standard information-theoretic approaches cannot be applied. Mixture models are increasingly popular for the analysis of amino acid datasets and can lead to different conclusions compared to partitioned models. This raises an important question - which type of model tends to perform better? Susko et al. (2026) recently introduced the marginal Akaike information criterion (mAIC), which allows mixture models and partitioned models to be directly compared for the first time. Here, we use the mAIC and a range of other approaches to compare the fit of mixture and partitioned models across a diverse set of empirical datasets. We show that mixture models are universally favoured on amino acid datasets. This has important implications for interpreting empirical analyses and suggests that continued development of mixture models is an important avenue for future research.

8
Using Variable Window Sizes for Phylogenomic Analyses of Whole Genome Alignments

Ivan, J.; Lanfear, R.

2026-03-06 bioinformatics 10.64898/2026.03.04.709403 medRxiv
Top 0.1%
1.8%

Many phylogenomic studies used non-overlapping windows to address gene tree discordance across a set of aligned genomes. Recently, Ivan et al. (2025) proposed an information theoretic approach to choose an optimal window size given the alignment. However, this approach selects only a single fixed window size per chromosome, which is a useful first step but fails to account for variation in the size of non-recombining regions along each chromosome. Such variation is expected to occur due to the stochastic nature of recombination as well as the variation in recombination rates along chromosomes. In this study, we extend the approach of Ivan et al. (2025) to allow window sizes to vary across the chromosome, using a splitting-and-merging strategy that allows for each window to be of an arbitrary length. We showed that the new method outperformed the fixed-window approach in recovering gene tree topologies on a wide range of simulated datasets. Applying the new method on the genomes of seven Heliconius butterflies, we found that the average window sizes for the group ranged between 538 and 808 bp, but with a very similar distribution of gene tree topologies compared to previous studies that used fixed window sizes. For the genomes of great apes, the average window sizes ranged from 4.2 kb to 6.2 kb, with the proportion of the major topology (i.e., grouping human and chimpanzee together) reaching approximately 80%. In conclusion, our study highlights the limitations of using a fixed window size when recombination rates vary across the chromosomes, and proposes a splitting-and-merging approach that allows for variable window sizes across whole genome alignments.

9
DEX: a consensus-based amino acid exchangeability measure for improved codon substitution modelling

Douglas, G. M.; Bobay, L.-M.

2026-03-12 bioinformatics 10.64898/2026.03.09.710665 medRxiv
Top 0.1%
1.8%

Physicochemically similar amino acids undergo more frequent substitutions compared to dissimilar amino acid pairs. Despite their clear potential, amino acid similarity matrices remain underused in molecular evolution, partially due to the high number of proposed amino acid distance measures and the lack of agreement on which are most accurate. In this study, we assessed the performance of 30 amino acid distance measures, including a new amino acid distance measure we developed based on recent deep mutational scanning data. We compared these measures across codon substitution models fit to alignments spanning Streptococcus, Drosophila, and mammalian lineages, as well as segregating variants across Escherichia coli strains and human genotypes. We further constructed consensus measures from combinations of top-performing measures in this analysis using the DISTATIS approach and retested these matrices. Our results show that experimentally-derived measures, particularly our new measure and the existing experimental exchangeability (EX) measure, best fit codon substitution patterns across diverse lineages. We found that a consensus measure based on these two approaches, which we named DEX, performed best overall. In addition, although site-specific variant effect predictors are intended to identify deleterious mutations, the representative tools we tested did not outperform amino acid distance measures for predicting mean substitution frequencies. They were, however, substantially more informative for identifying individual highly deleterious mutations. Overall, we provide a systematic comparison of the performance of existing measures, and we introduce an improved general-purpose amino acid distance measure for molecular evolution models. Significance: Protein-coding genes have long been a focus for researchers studying the strength and direction of selection. 
By studying non-synonymous substitutions, those that change amino acids, it is possible to estimate the relative strength of selection. Despite widespread interest in such approaches, information on which amino acids are exchanged is underused in molecular evolution models. This is partly because many different measures exist for quantifying amino acid distances, particularly those based on physicochemical properties. A newer class of amino acid distance measures is derived from deep mutational scanning datasets, where virtually every possible substitution is tested for its impact on protein function. We characterised and compared 30 amino acid distance measures, including a novel measure based on deep mutational scanning data. We highlight differences in how well these measures fit real substitution and polymorphism datasets. Overall, we find that DEX, which is a consensus of our new measure and an existing experimental exchangeability measure, represents the best available amino acid distance measure to incorporate into molecular evolution models.

10
A Data-Analysis Pipeline for High-Throughput Systematic Evolution of Ligands by Exponential Enrichment (HT-SELEX) in the Characterization of Telomeric Proteins

Williams, J. D.; Tesmer, V. M.; Kannoly, S.; Shibuya, H.; Nandakumar, J.

2026-03-07 biochemistry 10.64898/2026.03.06.710105 medRxiv
Top 0.1%
1.7%

Telomeres are nucleoprotein structures at the ends of eukaryotic chromosomes that safeguard them from triggering inappropriate DNA damage signaling. POT1, a member of the mammalian shelterin complex, binds single-stranded (ss) telomeric DNA and blocks the activation of the ATR kinase-mediated DNA damage response at telomeres. Yet until recently, it was poorly understood how the double-stranded (ds)-ss telomeric junction was protected from DNA damage response factors. An initial study of the DNA-binding activity of human POT1 (hPOT1) using systematic evolution of ligands by exponential enrichment (SELEX) and subsequent investigation revealed that POT1 contains a binding pocket, known as the POT-hole, that binds the 5′-phosphorylated dC of the telomeric ds-ss junction. The amino acid residues composing the POT-hole show full sequence identity with telomeric proteins from diverse eukaryotes, including Caenorhabditis elegans POT-1. The current study builds on this SELEX method, developing an extensive analysis pipeline for SELEX datasets sequenced by next-generation sequencing and achieving a deeper analysis of the resulting sequences. We validated our approach by applying it to the DNA-binding domain of hPOT1, yielding results consistent with a previous SELEX study. Furthermore, we employ our pipeline to characterize the DNA-binding activity of C. elegans proteins that are considered homologs of hPOT1: POT-1, POT-2, POT-3, and MRT-1. Our analysis suggests that all four proteins show a binding preference for G-enriched DNA sequences, with POT-1 additionally binding secondary structural elements. Overall, we present a bioinformatics pipeline that is accessible and applicable for determining the nucleic acid-binding properties of a variety of proteins.

11
Genome quality variation across Scyphozoa and the comparative distribution of retinoid- and AhR-related gene families.

Park, Y.-J.; Lee, N.; JO, Y.; Yum, S.; Kwon, K. K.

2026-04-23 evolutionary biology 10.64898/2026.04.22.720242 medRxiv
Top 0.2%
1.7%

Scyphozoan jellyfish have a complex life cycle that includes a characteristic transition known as strobilation. Retinoid signaling has been suggested to be involved in jellyfish metamorphosis and development. However, the genomic basis of signaling pathways associated with metamorphosis has not been sufficiently compared at the class level. Experimental studies have reported that indole compounds can induce metamorphosis in some jellyfish species. Indole- and tryptophan-derived metabolites are known to function as ligands for the aryl hydrocarbon receptor (AhR) in other organisms. However, the potential role of AhR signaling in jellyfish metamorphosis has not been previously explored. We compared the distribution of retinoid- and AhR-associated gene families across multiple scyphozoan genomes. This analysis aimed to characterize their distribution patterns in relation to signaling pathways associated with development and environmental responses. A standard gene prediction and annotation pipeline was applied to 20 species from 21 publicly available scyphozoan reference genome assemblies retrieved from the NCBI database. The distribution and copy number of these gene families were compared across species. Retinoid-associated gene families were detected across almost all Scyphozoa genomes, and core components of AhR signaling (AhR, ARNT) were identified in most species. These results suggest that scyphozoan genomes contain genetic components of retinoid- and AhR-related signals. This study presents the distribution of gene families related to developmental signaling across Scyphozoa using a comparative genomic approach. It does not imply direct functional involvement of retinoid or AhR signaling, but instead focuses on potential signaling pathways at the genome level. It also provides an overview of currently available scyphozoan genomic data. These findings provide a basis for future hypothesis generation and functional validation in jellyfish metamorphosis research.

12
Comparative analysis of transposable elements in jellyfish and hydroid species (Cnidaria: Medusozoa)

Mays, A.; Cabrera, F.; Macias-Munoz, A.

2026-04-21 evolutionary biology 10.64898/2026.04.17.719288 medRxiv
Top 0.2%
1.5%

Background: Transposable elements (TEs) are repetitive genetic elements that can jump to new loci, causing genome expansions and structural rearrangements, and can ultimately propel the evolution of genomes. Despite their significance, the role of TEs in the evolution of genomes and phylogenetic groups remains largely understudied in early diverging lineages. Further, the extent to which TE content varies across species is still an open question. Medusozoa, a group within Cnidaria encompassing jellyfish and hydroids, exhibits an exceptional diversity of life history strategies, body plans, and physiological capabilities. These characteristics, along with its early-diverging phylogenetic position, establish Medusozoa as an ideal system for investigating the composition and evolutionary history of TEs within the group. Results: We generated a custom repeat library built from annotations of 25 Medusozoan genomes and used it to characterize TEs, aiming to identify lineage-specific TE content and activity that may correlate with the diversity observed within the group. We found that repetitive element percentage and genome size varied considerably, with Hydrozoa exhibiting the most variation among classes in both respects. DNA transposons were the most prevalent TE classification in all but two genomes, averaging 28% of all genomes. Intra-genus comparisons revealed a surprising degree of difference in TE content. In the genus Aurelia, the expansion of a single DNA transposon superfamily accounted for much of the difference in repetitive element percentage between two species, whereas in the genus Turritopsis, a similar divergence resulted from the proliferation of multiple superfamilies. Interestingly, most genomes showed evidence of recent TE expansions, suggesting ongoing activity in many medusozoan species. Conclusion: We present the first comparative analysis of TEs across all medusozoan classes. 
Our results reveal class-specific TE dynamics and highlight cases of TE proliferations as lineages diverge. This research provides data on TE activity and diversity that can be used as a resource for future study and fills important gaps in our understanding of TEs in early diverging animal lineages.

13
In silico restriction site analysis of whole genome sequences shows patterns caused by selection and sequence duplications

Vedder, L.; Schoof, H.

2026-05-16 genomics 10.64898/2026.05.15.725336 medRxiv
Top 0.2%
1.5%

Biological sequences are known to be non-random. Thus, the comparison of in silico restriction fragment distributions of random and biological sequences may be an indicator of this non-randomness. Our analyses show that for most of the tested combinations of restriction enzyme and genome sequence, the fragments per megabase of the biological sequence deviate by more than 10% from the corresponding random sequence. This deviation goes in both directions, i.e. clearly increased values are as common as clearly decreased values. Although there is no species- or restriction-enzyme-specific effect, a clear impact of the GC content both of the restriction site and of the genome sequence can be seen. In contrast to the random sequences, the genome sequences show distinct peaks in their fragment length distributions, hinting at repetitive elements such as transposons.
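
The comparison described in this abstract can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: all function names are hypothetical, the molecule is treated as linear, and the random reference is a sequence of independent bases drawn at a given GC content. It counts recognition sites, converts them to fragments per megabase, and reports the relative deviation from the random expectation.

```python
import random


def count_sites(seq, site):
    """Count occurrences of a recognition site, overlaps included.

    For a palindromic site such as GAATTC, scanning one strand is
    enough; a non-palindromic site would also need its reverse
    complement searched.
    """
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def fragments_per_mb(seq, site):
    """Fragments per megabase; n cuts in a linear molecule give n + 1 fragments."""
    return (count_sites(seq, site) + 1) / len(seq) * 1e6


def expected_sites(length, site, gc):
    """Expected site count in a random sequence with the given GC content."""
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in site:
        prob *= p[base]
    return (length - len(site) + 1) * prob


def random_sequence(length, gc, seed=0):
    """Random sequence of independent bases with the given GC content."""
    rng = random.Random(seed)
    weights = [gc / 2, gc / 2, (1 - gc) / 2, (1 - gc) / 2]
    return "".join(rng.choices("GCAT", weights=weights, k=length))


def relative_deviation(observed, expected):
    """Signed deviation of observed from expected, as a fraction."""
    return (observed - expected) / expected


if __name__ == "__main__":
    site = "GAATTC"                      # EcoRI recognition site
    genome = random_sequence(200_000, gc=0.41, seed=1)  # stand-in for a real genome
    reference = random_sequence(200_000, gc=0.41, seed=2)
    obs = fragments_per_mb(genome, site)
    ref = fragments_per_mb(reference, site)
    print("relative deviation:", relative_deviation(obs, ref))
```

With a real genome in place of the stand-in sequence, an absolute relative deviation above 0.1 would correspond to the 10% threshold mentioned in the abstract.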

14
Gene family evolutionary dynamics reveal convergent genomic signatures in pancrustacean metamorphosis

Campli, G.; Chipman, A. D.; Waterhouse, R. M.

2026-05-08 evolutionary biology 10.64898/2026.05.06.723392 medRxiv
Top 0.2%
1.5%

Arthropods exhibit an exceptional diversity of life histories, where developmental modes involve moulting stage progressions with changes ranging from the bare minimal to the dramatically transformative. While this variability drives many research questions aiming to understand evolutionary and developmental underpinnings of life history differences, it can complicate comparative analyses across taxa. However, this can be approached by applying a framework that defines metamorphosis as a post-embryonic stage progression characterised by substantial changes in morphology and adaptive landscape. Employing this framework with a phylogenomic dataset spanning 26 orders and encompassing four independently arising metamorphic lineages, we explore gene repertoire evolutionary dynamics potentially associated with metamorphosis in Pancrustacea. The approach contrasts gene family evolutionary dynamics inferred to have occurred in the last common ancestors of the metamorphic Insecta, Copepoda, Eucarida, and Thecostraca, with those of their sister lineages, as well as of descendent and ancestral nodes. The results reveal that the metamorphosis ancestors are characterised by an elevated number of gene family births and expansions. Expanded gene families share a set of commonly enriched biological processes across all metamorphosis ancestors, suggesting functional convergence by independent evolution of distinct gene families involved in embryonic and post-embryonic development and nervous system differentiation. Evolutionary modelling further highlights a subset of these families exhibiting signatures of adaptive, lineage-specific gene family size increases associated with metamorphic development. These families include genes implicated in neural and sensory development, segmentation, and moulting. 
These findings support a model of the evolution of pancrustacean metamorphosis where distinct gene families from a common functional toolkit expand and are co-opted into facilitating transitions to multi-phasic life cycles. This reframes the role of moulting in arthropod diversification to be recognised as an important reservoir of genetic change that can potentiate truly remarkable life history transitions.

15
Horizontal transfer of an antimicrobial peptide across insects

Aumont, C.; Dhakad, P.; Alff, D. M.; McMahon, D. P.; Hanson, M. A.

2026-03-05 evolutionary biology 10.64898/2026.03.03.709459 medRxiv
Top 0.2%
1.4%

Antimicrobial peptides (AMPs) are key defence molecules of the innate immune system of plants and animals. Understanding the evolutionary origins of AMPs can help to explain how immune systems acquire novelty and vary in their defensive capabilities. However, AMPs evolve rapidly, and so the origins of similar AMPs across organisms are often unclear. Furthermore, false negatives due to low search sensitivity are common and can hinder confident annotations about true absences. Due to these difficulties, understanding whether similar AMP genes found in diverse organisms represent ancestral molecules or evolutionary novelties has been challenging. In this report, we present evidence of horizontal gene transfer (HGT) of the antifungal peptide gene Drosomycin across insects. We show that in Diptera, the presence of Drosomycin is restricted to the Melanogaster group and additionally the distant relative Drosophila busckii. We go on to recover Drosomycin genes in cockroaches (Blattodea), mantises (Mantodea), one katydid (Orthoptera), various beetles (Coleoptera), and a recently acquired pseudogenized Drosomycin locus in Liposcelis booklice (Psocodea), but no other insects. Explaining this diversity through shared ancestry requires at least 50 independent loss events, or just seven HGT events. Previous studies have suggested that similar AMPs found across divergent species reflect conservation from a common ancestor, or, due to their small size, that they arose via convergent evolution resulting from pathogen-imposed selection. Our findings suggest horizontal gene transfer can be responsible for the presence of some AMP genes found scattered across the tree of life. By presenting a mechanism through which immune systems can acquire novelty, our study also suggests a possible explanation for certain lineage-specific competencies for defence against infectious disease. While loss of AMP genes is common in certain lineages, here we suggest gain of AMPs can occur just as suddenly.

16
Contrasting Species-Level and Genus Level Disparity Patterns within the ammonoid family Acanthoceratidae

Howard, L.; Wagner, P. J.

2026-03-23 paleontology 10.64898/2026.03.20.713222 medRxiv
Top 0.2%
1.3%

Paleobiologists commonly use genera as a proxy for species in biodiversity studies. However, a lingering concern is that patterns among genera might not always faithfully reflect patterns among species. To date, the concern has focused chiefly on measured patterns of richness over time and on implied origination and extinction rates. However, similar issues might arise for studies of morphological disparity. Moreover, there potentially are additional implications of disparity patterns among species versus those among genera concerning the range of observable anatomical characters and whether disparity within genera is comparable to disparity among genera. If clades have some relatively slowly changing characters that workers have used to denote different genera, then we would expect congeneric species to cluster in morphospace; however, if such characters are rare, then within-genus disparity might approach among-genus disparity. Here, we compare genus-level and species-level disparity patterns among acanthoceratid ammonoids from the Late Cretaceous. In particular, we examine whether these different levels imply different evolutionary dynamics over a major ecological event (Ocean Anoxic Event 2, OAE2) and how disparity within genera (i.e., among congeneric species) compares to disparity among genera. We find that genus-level disparity somewhat inflates early acanthoceratid disparity but implies similar patterns over OAE2. We also find that within-genus disparity is slightly lower than among-genus disparity, but not hugely so. The combined results suggest that acanthoceratoid shell anatomy does not really show "genus" level characters, even if congeneric species do tend to be more similar to each other than to species in other genera. Thus, this might provide more of a warning for other types of studies using anatomical data (e.g., phylogenetic studies) than for disparity studies.
Non-technical Summary: Many paleobiologists use genera to examine scientific questions. This leads to questions over whether this broader approach misses important species-level patterns. This study uses acanthoceratid ammonoids from the Late Cretaceous to examine disparity patterns at both the genus level and the species level. We specifically examine the disparity at both levels over a time of high stress for this group, Ocean Anoxic Event 2 (OAE2). Our results show that genus-level disparity slightly exaggerates early acanthoceratid disparity but converges on a pattern similar to species-level disparity during OAE2. Within-genus disparity is shown to be slightly lower than among-genus disparity, but not enough to be startling. Together, these results indicate that while species within the same genus tend to be more similar to each other than to those in other genera, there isn't a set of true "genus" level characters. This outcome is a warning for phylogenetic studies that use anatomical data, but less so for disparity studies.

17
MauE from Calditrichota and Thermodesulfobacteriota reveal a new pathway for disulfide bond formation in bacteria

Gonzalez, C.; Moilanen, A.; Korhonen, K.; Thu, N. P. A.; Hiltunen, J.; Saaranen, M.; Ruddock, L. W.

2026-03-05 biochemistry 10.64898/2026.03.05.709764 medRxiv
Top 0.2%
1.3%

Disulfide bond formation is crucial to the structure and function of many proteins. It is known that there is diversity in the pathways for disulfide bond formation in bacteria and that there are gaps in our knowledge of these pathways. Using a combination of experimental and bioinformatic approaches, we show that some of these gaps can be filled by a newly discovered oxidative folding pathway centered on methylamine utilization protein E (MauE). MauE has previously been associated with the methylamine utilization (MAU) gene cluster, which is involved in methylamine metabolism; in particular, it is associated with the maturation of the small subunit of methylamine dehydrogenase. Here we show that MauE from Caldithrix abyssi and Desulfatibacillum aliphaticivorans functionally replaces disulfide bond formation protein B (DsbB) in E. coli, as shown by two independent disulfide bond-dependent assays. Furthermore, MauE is found in 14 species from 2 bacterial phyla that lack known pathways for structural disulfide bond formation, but which have proteins with structural disulfide bonds in the Protein Data Bank. The active site for MauE was determined to be a conserved CXC motif. Using molecular docking predictions, we demonstrate that MauE is likely to interact with ubiquinone, similarly to the well-characterized bacterial DsbB. We also constructed a dataset across thirty-five different phyla to demonstrate that MauE is potentially the second most common disulfide bond formation protein in bacterial disulfide bond formation pathways after DsbB. In addition, the distribution of MauE largely differs from the distribution of other MAU gene cluster markers, affirming its role as a newly discovered generalist disulfide bond formation protein rather than a specialized maturation factor for methylamine dehydrogenase. We also reveal further gaps in disulfide bond pathways, as well as species which may contain redundancies in their disulfide bond pathways.

18
Genes near tRNAs are enriched in translational machinery

West, C.; Dineen, L.; LaBella, A. L.

2026-03-16 bioinformatics 10.64898/2026.03.12.711363 medRxiv
Top 0.2%
1.3%

Transfer RNAs (tRNAs) are known for delivering amino acids to the growing polypeptide chain during translation. They can also influence gene expression, especially in times of nutrient starvation, through differential tRNA expression and modification. tRNAs have a highly consistent cloverleaf structure, but relatively few known regulatory elements govern this conserved structure despite the 20 different standard isotypes. This study examines gene enrichment patterns near tRNAs in 1154 fungal genomes. Genes involved in proteasome regulation, ion transport, and rRNA were found to be significantly closer to tRNAs than genes in other pathways. These results were consistent across KEGG over-representation analysis (ORA), KEGG Gene Set Enrichment Analysis (GSEA), and Gene Ontology (GO) analysis. Proteasome, ion transport, and RNA are all important aspects of protein production and regulation, suggesting that genes required for the synthesis and quality control of proteins, including tRNAs, are located near each other. Protein regulation is an energetically expensive process, and local co-regulation could increase efficiency and stress impacts on proteins.

19
Gene model for the ortholog of Lst8 in Drosophila yakuba

Lawson, M. E.; Sanow, K. A.; Chetana, K.; Taylor, E.; Morgan, A.; Flannery, D.; Elsie, C.; Rele, C. P.; Reed, L. K.; O'Rourke, K. S.

2026-05-14 genomics 10.64898/2026.05.12.723325 medRxiv
Top 0.2%
1.2%

Gene model for the ortholog of Lst8 (Lst8) in the May 2011 (WUGSC dyak_caf1/DyakCAF1) Genome Assembly (GenBank Accession: GCA_000005975.1) of Drosophila yakuba. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.

20
The causes of signed linkage disequilibrium within genomic datasets

Stetsenko, R.; Merot, C.; Glemin, S.; Roze, D.

2026-04-21 genomics 10.64898/2026.04.17.719204 medRxiv
Top 0.2%
1.2%

Several recent studies have quantified signed linkage disequilibrium (LD) among mutations in genomic datasets, often reporting positive LD, particularly among mutations presumed to be less deleterious, such as synonymous variants. In this article, we investigate two potential sources of this positive LD: the focus on rare alleles, as adopted in several previous studies, and errors arising in the mapping of short-read sequences onto a reference genome. Using coalescent simulations, we extend previous theoretical results on the effect of focusing on rare alleles, and show that derived alleles present at similar frequencies tend to be in positive LD. Reanalyzing datasets from Capsella grandiflora and Drosophila melanogaster, we show that LD among synonymous derived alleles vanishes in the absence of any conditioning on frequency, while LD between mutations categorized as potentially deleterious by the SIFT4G program stays positive. However, we show that in both cases, this positive LD may be at least partly caused by the potential mismapping of a small fraction of sequences in some individuals, which could be a consequence of structural variants that are absent from the reference genome. Overall, these results show that average signed LD among mutations can be strongly affected by technical artifacts even if these concern only a minority of variants. Finally, we discuss other possible sources of positive LD among deleterious mutations.
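
For readers unfamiliar with the quantity, signed LD between two sites is commonly written as D = f(11) - p1 * p2, where derived alleles are coded as 1, f(11) is the frequency of haplotypes carrying both derived alleles, and p1 and p2 are the two derived-allele frequencies. The sketch below is only an illustration of that textbook definition on made-up phased haplotypes; it is not the authors' pipeline, and the function names `signed_ld` and `mean_signed_ld` and the `toy` data are hypothetical.

```python
from itertools import combinations


def signed_ld(haplotypes, i, j):
    """Signed LD, D = f(11) - p_i * p_j, between sites i and j.

    Assumes phased haplotypes given as tuples of 0/1, with 1 = derived allele.
    """
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    return p_ij - p_i * p_j


def mean_signed_ld(haplotypes):
    """Average signed LD over all pairs of sites (no frequency filter)."""
    n_sites = len(haplotypes[0])
    pairs = list(combinations(range(n_sites), 2))
    return sum(signed_ld(haplotypes, i, j) for i, j in pairs) / len(pairs)


# Hypothetical toy data: 4 haplotypes, 3 sites.
toy = [
    (1, 1, 0),
    (1, 1, 0),
    (0, 0, 1),
    (0, 0, 0),
]

print(signed_ld(toy, 0, 1))  # derived alleles always co-occur: positive D
print(signed_ld(toy, 0, 2))  # derived alleles never co-occur: negative D
print(mean_signed_ld(toy))
```

In this toy case the two derived alleles at equal frequency (sites 0 and 1) sit on the same haplotypes and give positive D, while the pairs involving site 2 give negative D. Restricting the average to a frequency class, as in the rare-allele conditioning the abstract describes, would mean filtering the site pairs before averaging.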